Abstract: An extended true-single-phase-clock (E-TSPC) based
divide-by-2/3 counter design for low supply voltage and low power
consumption applications is presented. By using a wired-OR scheme, only
one transistor is needed to implement both the counting logic and the
mode selection control. This can enhance the working frequency of the
counter due to a reduced critical path between the E-TSPC flip-flops
(FFs). Since the number of stacked transistors between the power rails
is kept at merely two, the proposed design sustains low-VDD operation
(531 MHz at a VDD of 0.6 V) for power saving. Simulation results show
that, compared with two classic E-TSPC based designs in a 0.18 μm
process technology, improvements of as much as 16.4% in operation speed
and 39% in power-delay product can be achieved by the proposed design.

Index Terms: Extended true-single-phase-clock flip-flops (E-TSPC FFs),
low power, low voltage, prescaler.


I. Introduction 

A high speed divide-by-N/N+1 counter (also called a prescaler) is a
fundamental module for frequency synthesizers. Its design is crucial
because it operates at a higher frequency and consumes more power. A
divide-by-N/N+1 counter consists of flip-flops (FFs) and extra logic,
which determines the terminal count. Conventional high speed FF based
divide-by-N/N+1 counter designs use current-mode logic (CML) latches
[1] and suffer from the disadvantage of large load capacitance. This
not only limits the maximum operating frequency and current-drive
capabilities, but also increases the total power consumption.
Alternatively, FF based divide-by-N/N+1 designs adopt dynamic logic FFs
such as true-single-phase-clock (TSPC) FFs [2]-[4]. The designs can be
further enhanced by using extended true-single-phase-clock (E-TSPC) FFs
for high speed and low power applications [5]-[10]. E-TSPC designs
remove the transistor stacked structure so that all the transistors are
free of the body effect. They are thus better suited to high frequency
operation under a low supply voltage.

Past optimization efforts on prescaler designs focused on simplifying
the logic part to reduce the circuit complexity and the critical path
delay. For example, an E-TSPC design embedded with one extra pMOS/nMOS
transistor can form an integrated function of FF and AND/OR logic [7].
Moving part of the control logic to the first FF to reduce unnecessary
FF toggling yields another version of the prescaler design [8]. These
two classic designs each contain only 16 transistors, and the mode
control logic uses as few as 4 transistors. Such circuit simplicity
calls for a ratioed structure in the FF design. Despite the distinct
speed performance, the incurred static and short circuit power
consumption is significant. The latest designs presented in [10] adopt
a general TSPC logic family containing both ratioed and ratioless
inverter alternatives. Since the maximum height of transistor stacking
is up to 5, these designs lose their performance advantages when
working in a low-VDD scenario. In [11], a power gating technique that
inserts an extra pMOS transistor between VDD and the FF is employed in
two novel divide-by-2/3 counter designs. The unused FF can be shut down
when working in the divide-by-2 mode. Due to the increase in the number
of stacked transistors (up to 4), these designs are not suitable for
low-VDD operations.

Due to the quadratic dependence of power consumption on supply voltage,
lowering VDD is a very effective measure to reduce the power at the
expense of speed performance. In this paper, a prescaler circuit design
aimed at tackling the speed and power issues simultaneously using a
non-state-of-the-art process technology (0.18 μm) is presented. In
particular, we focus on low-VDD operations for power saving without
sacrificing the speed performance. In this design, ratioed E-TSPC FFs
are employed for their circuit simplicity and speed performance. Only
one pass transistor is needed to implement the mode control logic. The
proposed design is capable of working at a maximum frequency of 531 MHz
when the supply voltage is as low as 0.6 V.
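
The quadratic dependence mentioned above can be illustrated with a short numerical sketch. It assumes the usual first-order dynamic power model P = alpha * C * VDD^2 * f, which is a textbook approximation and not a formula given in this paper; the function name is hypothetical. The 1.8 V reference is the nominal supply of 0.18 μm technology quoted later in the text.

```python
def relative_dynamic_power(vdd, vdd_ref, f=1.0, f_ref=1.0):
    """Ratio P / P_ref under the assumed model P = alpha * C * VDD^2 * f.

    The activity factor alpha and the switched capacitance C are taken
    to be unchanged, so they cancel out of the ratio.
    """
    return (vdd / vdd_ref) ** 2 * (f / f_ref)


NOMINAL_VDD = 1.8  # nominal supply of 0.18 um technology, in volts

for vdd in (0.9, 0.6):
    ratio = relative_dynamic_power(vdd, NOMINAL_VDD)
    print(f"VDD = {vdd} V: {ratio:.3f} of the dynamic power at {NOMINAL_VDD} V")
```

At equal clock frequency, the model predicts roughly one ninth of the nominal dynamic power at 0.6 V. Static and short circuit power, which matter for ratioed logic, are not captured by this approximation.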

II. Conventional E-TSPC-Based Divide-by-2/3 Counter Designs

A state-of-the-art divide-by-2/3 counter design is given in Fig. 1(a)
[7]. It contains two E-TSPC-based FFs and two logic gates, i.e., an OR
gate and an AND gate. When the divide control signal DC is "0", the OR
gate (merged into the output of the FF1 design) is disabled. The state
of (Q1, Q2b) cycles through 11, 01, and 00. This corresponds to a
divide-by-3 function. Note that state 10 is a forbidden state. If,
somehow, the circuit enters this state, the next state will go back to
a valid state, 11, automatically. When DC is "1", the output of FF1
will be disabled and FF2 alone performs the divide-by-2 function. Since
the input to FF1 is not disabled, FF1 toggles as usual and causes
redundant power consumption in the divide-by-2 mode operation.
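
The state sequence described above can be checked with a small behavioral model. The excitation equations in the sketch are assumptions inferred from the stated cycle (11, 01, 00) and from the recovery of the forbidden state 10; they are not quoted from the paper, and the signal and function names are illustrative only.

```python
def next_state(q1, q2b, dc):
    """One clock step of a behavioral divide-by-2/3 counter model.

    Assumed excitation logic (inferred, not taken from the paper):
      Q1_next  = NOT Q2b
      Q2b_next = Q1 OR NOT Q2b   when DC = 0 (divide-by-3)
      Q2b_next = NOT Q2b         when DC = 1 (divide-by-2, FF2 alone)
    """
    q1_next = 1 - q2b
    if dc == 0:
        q2b_next = q1 | (1 - q2b)
    else:
        q2b_next = 1 - q2b
    return q1_next, q2b_next


def run(state, dc, steps):
    """Return the list of (Q1, Q2b) states visited, starting from state."""
    seq = [state]
    for _ in range(steps):
        state = next_state(state[0], state[1], dc)
        seq.append(state)
    return seq


print("divide-by-3:", run((1, 1), dc=0, steps=3))
print("forbidden 10 ->", next_state(1, 0, dc=0))
print("divide-by-2:", run((1, 1), dc=1, steps=4))
```

In the divide-by-2 run, Q1 keeps toggling even though only Q2b is needed, which mirrors the redundant FF1 activity noted for design [7].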

To overcome this problem, another divide-by-2/3 counter design
presented in [8] is shown in Fig. 1(b). By moving the divide control
logic from the output of FF1 to its input, the output of the first
stage in FF1 is frozen when DC is "1". This keeps the following stages
from any switching activities for the purpose of power saving. The
first stage itself, however, encounters larger power consumption than
its counterpart in design [7]. This is because the pull-up path is
turned on all the time and the short circuit current is drawn
repetitively whenever the clock signal turns "1". The critical path
delay, formed by the two FFs and the control logic, is the dominant
factor of the prescaler's maximum operating frequency. In spite of the
circuit simplicity in designs [7] and [8], the inverter between FF1 and
FF2, which is essential to the logic of divide-by-3, causes extra
delay. Merging control logic with FF designs also introduces parallel
connected transistors, leading to larger parasitic capacitance that is
adverse to both speed and power consumption. In view of these issues,
our approach is to keep the circuit simple so that the delay and the
power consumption problems can be improved at the same time.

III. Proposed Divide-by-2/3 Counter Design 

The logic structure of the proposed design is shown in Fig. 2. The two
FFs and the AND gate are common to previous designs. The OR gate for
the divide control is replaced with a switch. Note that there is a
negation bubble at one of the AND gate's inputs. The output Q of FF1 is
complemented before being fed to FF2. When the switch is open, the
input from FF1 is disconnected and FF2 alone divides the clock
frequency by 2. When the switch is closed, similar to the design in
[7], FF1
Fig. 1. Previous E-TSPC-based divide-by-2/3 counter designs. (a) Design [7]. (b) Design [8].
Fig. 2. Logic structure of proposed divide-by-2/3 counter design.
Fig. 3. MOS schematic of proposed E-TSPC-based divide-by-2/3 counter design with pass transistor logic circuit technique.


and FF2 are linked to form a counter with three distinct states. Fig. 3
shows the circuit implementation. According to the simulation results
given in [12], the E-TSPC design shows the best speed performance among
various counter designs, including the one using conventional
transmission gate FFs. Besides the speed advantage, E-TSPC FFs are
particularly useful for low voltage operations because of the minimum
height in transistor stacking. Other than the two E-TSPC FFs, only one
pMOS transistor, PDC, is needed. The pMOS transistor controlled by the
divide control signal serves as the switch. The AND gate plus its input
inverter are achieved by way of wired-AND logic using no extra
transistors at all. The proposed design scheme is far more
sophisticated than the measure of simply adding one pass transistor may
suggest. First of all, unlike any previous designs, the E-TSPC FF
design remains intact without any logic embedding. Both speed and power
behaviors are not affected, which indicates a performance edge over the
logic embedded FF design. Secondly, the inverter that complements one
of the two E-TSPC FF outputs for divide-by-3 operations is removed in
the proposed design. The circuit simplification, again, suggests
improvements in both speed and power performance.

The working principle of the proposed design is elaborated as follows.
When DC is "1", the pMOS transistor is turned off as a switch should
behave. A single pMOS transistor, however, presents a smaller
capacitive load to FF1 than an inverter does in design [7]. When DC is
"0", the output of FF1, Q1, is tied with the output node of the 1st
stage inverter of FF2 through the pMOS transistor. In an E-TSPC FF
design, the output of the first stage inverter can be regarded as
complementary to the input, i.e., node D2b in FF2 carries the
complement of the input D2. Therefore, a wired-OR logic is in fact
implemented. Either Q2b being "0" or Q1 being "1" pulls the output node
of the inverter high. This means $D2b = Q1 + \overline{Q2b}$. Applying
De Morgan's law to the Boolean equation gives rise to
$D2 = \overline{Q1} \cdot Q2b$, which is exactly the desired logic.
Since Q1 is applied at node D2b rather than at the input of FF2, the
inverter needed to complement the signal Q1 can be eliminated.
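
The De Morgan step can be verified exhaustively. The sketch below reads the wired-OR relation as D2b = Q1 OR NOT(Q2b) and the equivalent input logic as D2 = NOT(Q1) AND Q2b; the signal names Q1, Q2b, D2 and D2b are assumptions recovered from the surrounding description, and the function names are hypothetical.

```python
from itertools import product


def d2b_wired_or(q1, q2b):
    """Wired-OR at the first-stage output of FF2: high if Q1 = 1 or Q2b = 0."""
    return q1 | (1 - q2b)


def d2_equivalent(q1, q2b):
    """Equivalent logic referred to the FF2 input: (NOT Q1) AND Q2b."""
    return (1 - q1) & q2b


print("Q1 Q2b | D2b  NOT(D2b)  (NOT Q1)*Q2b")
for q1, q2b in product((0, 1), repeat=2):
    d2b = d2b_wired_or(q1, q2b)
    d2 = d2_equivalent(q1, q2b)
    # De Morgan: NOT(Q1 + NOT Q2b) == (NOT Q1) * Q2b for every input pair
    assert 1 - d2b == d2
    print(f" {q1}  {q2b}  |  {d2b}      {1 - d2b}          {d2}")
```

Only the input pair Q1 = 0, Q2b = 1 leaves D2b low, which is the single case where neither source pulls the shared node high.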

Before elaborating on the functional correctness of this wired-OR
logic, the working principle of the E-TSPC FF design is briefly
reviewed. An E-TSPC FF consists of two pseudo pMOS inverters followed
by a D-latch. When the clock signal equals 1, the outputs of the two
inverters are pre-discharged to zero. In the meantime, the pMOS and
nMOS transistors of the D-latch (the third inverter) are both turned
off so that the output value is held via the parasitic capacitance.
When the clock signal turns to 0, the first two inverters enter the
evaluation phase and the D-latch becomes a pseudo nMOS inverter to
admit the evaluation result from the preceding inverter.

Table I shows the state transition table and the excitation logic of
(Q1, Q2b) when working in the divide-by-3 mode. The wired-OR function
is implemented by connecting the output node of FF1 (Q1) and the output
node of the 1st stage inverter in FF2 through a pMOS transistor. Any
signal inconsistency between these two nodes must be resolved by way of
logic OR. In other words, signal "1" must override signal "0". In
Table I, there exist two cases of such signal inconsistency.

Case 1 occurs when both Q1 and Q2b are equal to "1". When the clock
signal is 1, node D2b is both driven low by a pull-down nMOS transistor
N2 and pulled high by Q1 through pMOS transistor PDC. Note that Q1 is
actually a weak "1" retaining its level via parasitic capacitance.
TABLE I

State Transition in Divide-by-3 Operations

Fig. 4. Simulation waveforms for wired-OR logic in divide-by-3 operations.


Although signal "0" seems to override signal "1", this will not affect
the correct value, i.e., "1", to be latched in the evaluation phase.
First, even though Q1 is vulnerable to the discharge by transistor N2,
the discharge hazard is alleviated by the threshold voltage drop across
pMOS transistor PDC. In particular, for low-VDD operations, the
threshold voltage, enlarged further by the body effect, can well exceed
one half of VDD. Fig. 4 shows that the level degradation of Q1 is mild.
Second, when the FF enters the evaluation phase, the pull-down
transistor N2 is cut off while transistor P1 is turned on and charges
D2b through transistor PDC. As the D2b waveform in Fig. 4 shows, the
voltage level is raised to 210 mV only. Although enlarging transistor
P1 can boost the level of D2b, it will degrade the speed performance as
well due to a larger capacitive load. Via proper transistor size
tweaking in the following stages, this level is good enough for a
correct "1" to be latched at Q2b when the clock signal turns "1" again.
A close examination of the waveform of node i reveals that, in spite of
a signal level over one half of VDD, it is not high enough to drive the
output node Q2b to an erroneous state "0".

The second case of signal inconsistency occurs when both Q1 and Q2b are
equal to "0". In the hold phase (when the clock signal is 1), node D2b
is always pulled low by transistor N2. In the evaluation phase,
transistor N2 is turned off and node D2b is pulled high via transistor
P2. Although node Q1 keeps a contradictory signal "0", it is actually a
weak "0" and will not affect the rising of node D2b. In particular,
transistor P2 charges Q1 through transistor PDC as well, which
coincides with the next state of Q1. In Fig. 4, we can see a steep 0 to
1 transition of Q1 due to this effect. This enhances the prescaler's
working frequency. Also indicated in Fig. 3 are the widths of the
transistors. A minimum channel length, which is 0.18 μm, is adopted in
all transistor designs. Although deliberate transistor sizing is
required to ensure the functionalities of wired-OR and E-TSPC, both FFs
share identical sizes to reduce the design complexity.

IV. Simulation Results and Performance Comparisons 

Post-layout simulations in HSPICE are conducted to compare the
performance of the proposed design with that of the two divide-by-2/3
counter designs shown in Fig. 1, which are considered two of the best
prior designs. Since the same type of E-TSPC FF is used in all three
designs, any performance discrepancy comes from the logic structure.
The designs in [10] and [11] are excluded because their stacked logic
structures significantly degrade their speed performance when working
in the territory of low VDD. The target technology is the TSMC 0.18 μm
1P/6M CMOS process. Transistor sizing is subject to the optimization of
power-delay product and the capability of functioning properly at a VDD
of 0.6 V. A typical-size inverter is used as the output load of node
Q2b. Designs [7] and [8] are remapped to the same process technology
and optimized using the same criteria. As the transistor sizes in
Fig. 3 show, the two E-TSPC FFs are identical to reduce the effort of
size tweaking. The two pseudo pMOS inverters in the E-TSPC FF design
are sized to assure that a logic "0" can be recognized when both
pull-up and pull-down transistors are turned on. However, the size of
the pull-down transistor N2 is deliberately set smaller to reduce the
adverse feedback effect on the stored charge in the first FF. The
pull-up transistor P2 at the first stage of FF2 is also enlarged to
resolve the signal inconsistency of "case 2" as described in
Section III. The third (or output) stage in an E-TSPC FF is actually a
latch, and equal sized P- and N-type MOS transistors are employed. The
setup time, hold time, D-to-Q and C-to-Q delays of our E-TSPC FF are
-49.7, 792.6, 801.3, and 925.6 ps, respectively. The sizing of the pass
transistor PDC is not as critical as its pivotal location suggests. The
layout of the proposed prescaler design is shown in Fig. 5.

Low Voltage and Low Power Divide-By-2/3 Counter Design Using Pass Transistor Logic Circuit Technique
International Journal of Research (IJR) Vol-1, Issue-4, May 2014 ISSN 2348-6848

Table II summarizes the design features of these divide-by-2/3 counter
designs at a 0.6 V supply voltage. The two numbers separated by a slash
in the "transistor count" row indicate the number of transistors needed
for the entire circuit versus that needed for the extra logic gates.
The layout area of the proposed design is 21.3% and 28.6% smaller
compared to design [7] and design [8], respectively. The maximum
operation speed is 16.4% and 11.7% faster in divide-by-3 operations.
The numbers are 11.8% and 13% in divide-by-2 operations. It is noted
that, in spite of operating at a higher frequency, the proposed design
consumes even less power than the other two designs. Due to the circuit
simplicity, the frequency jitter of the proposed design is also smaller
than that of the other two designs. Although the numbers compiled in
Table II are obtained under the condition of VDD = 0.6 V, the proposed
design exhibits consistent advantages in speed and power throughout the
other voltage settings. Since we focus on low-VDD operations, the
voltage range of the simulations is between 0.6 and 0.9 V, in contrast
to the nominal 1.8 V used in 0.18-μm technology. Table III shows the
results of maximum working frequency versus supply voltage. The speed
advantage of the proposed design is maintained in all voltage settings.
The clock rate can be up to 2.98 GHz when VDD reaches 0.9 V. Fig. 6
shows the simulation results of power-delay product (PDP), a compound
measure of power and speed, versus supply voltage. The PDP value is
measured at the point of the maximum working frequency for each voltage
setting and can be regarded as normalized power consumption or the
energy consumption per clock cycle. The PDP curves of the proposed
design are well below those of the other two designs in both operation
modes. Compared to design [7], the maximum PDP saving is up to 39%. The
PDP values for VDD equal to 0.6 V are also listed in Table II. Fig. 7
shows the PDP performance of these designs at different process and
temperature corners. The voltage is fixed at the lowest VDD of 0.6 V
and the temperature varies from 0 °C (FF corner) and 25 °C (SF, FS, TT
corners) to 100 °C (SS corner). The simulation shows the design
robustness against PT variations. The performance edge of the proposed
design is maintained in all corners.
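
The percentages quoted in this section can be re-derived from the Table II entries. The sketch below takes PDP as average power divided by maximum frequency, which matches the description of PDP as energy per clock cycle; the dictionary layout and helper names are illustrative and not part of the paper.

```python
# Table II entries at VDD = 0.6 V, ordered (divide-by-2, divide-by-3).
MAX_FREQ_MHZ = {"[7]": (475, 451), "[8]": (470, 470), "proposed": (531, 525)}
AVG_POWER_UW = {"[7]": (6.38, 5.92), "[8]": (5.74, 5.24), "proposed": (4.35, 4.61)}
AREA_UM2 = {"[7]": 91.51, "[8]": 100.85, "proposed": 71.98}


def pdp_fj(power_uw, freq_mhz):
    """Energy per clock cycle in fJ: uW / MHz gives pJ, times 1000 gives fJ."""
    return power_uw / freq_mhz * 1000.0


def gain_pct(new, old):
    """Relative increase of new over old, in percent."""
    return (new / old - 1.0) * 100.0


def saving_pct(new, old):
    """Relative reduction of new with respect to old, in percent."""
    return (1.0 - new / old) * 100.0


for name in MAX_FREQ_MHZ:
    pdps = [pdp_fj(p, f) for p, f in zip(AVG_POWER_UW[name], MAX_FREQ_MHZ[name])]
    print(f"PDP {name}: {pdps[0]:.2f} / {pdps[1]:.2f} fJ")

for ref in ("[7]", "[8]"):
    speed2 = gain_pct(MAX_FREQ_MHZ["proposed"][0], MAX_FREQ_MHZ[ref][0])
    speed3 = gain_pct(MAX_FREQ_MHZ["proposed"][1], MAX_FREQ_MHZ[ref][1])
    area = saving_pct(AREA_UM2["proposed"], AREA_UM2[ref])
    print(f"vs {ref}: +{speed2:.1f}% (div-2), +{speed3:.1f}% (div-3), "
          f"{area:.1f}% smaller area")

pdp_saving = saving_pct(pdp_fj(4.35, 531), pdp_fj(6.38, 475))
print(f"divide-by-2 PDP saving vs [7]: {pdp_saving:.1f}%")
```

Under these definitions the computed PDP values agree with the tabulated ones to two decimal places, and the 39% PDP saving corresponds to the divide-by-2 comparison against design [7].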

Besides corner simulations, Monte Carlo simulations were also
conducted, and the proposed design does not exhibit any significant
performance variations when compared with the other two designs. Due to
space limitation, the simulation plot is not shown in this paper.
Fig. 8 depicts the post-layout simulation waveforms of the proposed
design for 0.6 V VDD operations. The waveforms are taken at node Q2b
loaded with a typical size inverter. The waveforms show an intact
signal "0", and the minimum level of signal "1" is 0.46 V, which is
large enough to turn off the pull-up pMOS transistor in an E-TSPC FF.

Fig. 7. Comparison of power-delay-product values in different process corners. (a) Divide-by-2. (b) Divide-by-3.

Fig. 8. Simulation waveforms of the proposed design.


TABLE II

Feature Comparison of Various Divide-by-2/3 Counter Designs

2/3 Counter                     Design [7]       Design [8]       Proposed
Transistor Count                16/4             16/4             13/1
Layout Area (μm²)               91.51            100.85           71.98
Max. Freq. (MHz), ÷2 / ÷3       475 / 451        470 / 470        531 / 525
Average Power (μW)              6.38 / 5.92      5.74 / 5.24      4.35 / 4.61
Power-Delay-Product (fJ)        13.43 / 13.13    12.21 / 11.15    8.19 / 8.78
Jitter (ps)                     13.7             10.01            7.02


TABLE III

Maximum Operating Frequency Versus Supply Voltage (GHz)

Counter designs    0.6 V    0.7 V    0.8 V    0.9 V
Design [7]         0.45     1.14     1.93     2.73
Design [8]         0.47     1.17     1.98     2.79
Proposed           0.52     1.24     2.10     2.98
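
As a rough cross-check of the claim that the speed advantage holds at every supply voltage, the Table III entries can be compared column by column. This is only an illustrative sketch: the values are the rounded GHz figures from Table III, and the helper name is hypothetical.

```python
SUPPLY_V = (0.6, 0.7, 0.8, 0.9)

# Maximum operating frequency in GHz, copied from Table III.
FMAX_GHZ = {
    "Design [7]": (0.45, 1.14, 1.93, 2.73),
    "Design [8]": (0.47, 1.17, 1.98, 2.79),
    "Proposed": (0.52, 1.24, 2.10, 2.98),
}


def advantage_pct(reference):
    """Speed advantage of the proposed design over a reference, per voltage."""
    return [
        (p / r - 1.0) * 100.0
        for p, r in zip(FMAX_GHZ["Proposed"], FMAX_GHZ[reference])
    ]


for ref in ("Design [7]", "Design [8]"):
    row = ", ".join(
        f"{v} V: +{a:.1f}%" for v, a in zip(SUPPLY_V, advantage_pct(ref))
    )
    print(f"Proposed vs {ref}: {row}")
```

Because Table III is rounded to two decimals, the 0.6 V figures differ slightly from those obtained with the MHz values in Table II.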


V. Conclusion and Discussion 

In conclusion, a novel low voltage, low power divide-by-2/3 counter
design suitable for high speed DLL applications is presented. The
proposed design successfully simplifies the control logic, and one pMOS
transistor alone serves the purposes of both mode selection and counter
excitation logic. The circuit simplicity leads to a shorter critical
path and reduced power consumption. Post-layout simulation results
demonstrate its advantages in power, speed, and layout area over
previous designs.